ABSTRACT

There have long been threads of investigation into covert 
channels, and threads of investigation into anonymity, but 
these two closely related areas of information hiding have not 
been directly associated. This paper represents an initial in- 
quiry into the relationship between covert channel capacity 
and anonymity, and poses more questions than it answers. 
Even this preliminary work has proven difficult, but in this 
investigation lies the hope of a deeper understanding of the 
nature of both areas. MIXes have been used for anonymity, 
where the concern is shielding the identity of the sender 
or the receiver of a message, or both. In contrast to traffic 
analysis prevention methods which conceal larger traffic pat- 
terns, we are concerned with how much information a sender 
to a MIX can leak to an eavesdropping outsider, despite the 
concealment efforts of MIXes acting as firewalls. 

Categories and Subject Descriptors 

H.1.1 [Models and Principles]: Systems and Information
Theory — Information theory 

General Terms 

Theory 
Richard E. Newman
University of Florida
CISE Department
Gainesville, FL 32611-6120
nemo@cise.ufl.edu

Allen R. Miller
Private Consultant
Washington, DC

Copyright 2003 Association for Computing Machinery. ACM acknowledges
that this contribution was authored or co-authored by a contractor or
affiliate of the U.S. Government. As such, the Government retains a
nonexclusive, royalty-free right to publish or reproduce this article,
or to allow others to do so, for Government purposes only.

WPES’03, October 30, 2003, Washington, DC, USA.

Copyright 2003 ACM 1-58113-776-1/03/0010 ...$5.00.

1. INTRODUCTION

In this paper we discuss a particular covert channel that
exists in an anonymizing network. We discuss how less than
perfect anonymity can inadvertently introduce covert
communication channels. We do not discuss “fixes” to the covert
channel problem as has been done in traffic analysis of net- 
work communications [16, 17, 26, 27, 28]. Rather, our in- 
terest is in measuring the covert channel capacity. These 
results can assist in bounds for covert channels, and lead 
one to consider different, or modified, design scenarios. Note
that even though some may consider studying covert chan- 
nels as being overly paranoid, covert channels should not be 
ignored [13] (a good starting place for the reader unfamiliar 
with covert channels). 

We present some simplified scenarios as a first step in this 
analysis. Unfortunately, the mathematical details of the re- 
sults showcased in this paper are quite complicated and de- 
tailed. Therefore, in the interest of writing a proceedings 
size paper, we have delegated the lengthier mathematical 
details to the internal (publicly available) tech report [14]. 
We have included the mathematical and information theo- 
retic details for the simpler cases in this paper, in the hopes 
of giving the reader a taste for the more complex cases. We 
thank a reviewer for pointing out [1, 6, 11], where some 
informal studies of covert channels and anonymity were dis- 
cussed. 

There is always one special transmitting node in a net- 
work called Alice. Alice and possibly other transmitters 
have legitimate business transmitting messages to a set of 
Receivers {R_i | i = 1, 2, ..., M}. These transmitters act
completely independently of one another, and have no direct
knowledge of each other’s recent transmission behavior. Al- 
ice may have some general knowledge of the long-term traffic 
levels produced by the other transmitters, e.g., the num- 
ber of other transmitters and their probabilistic behavior, 
which can allow Alice to write a code that can improve the 
covert communication channel’s data rate. She cannot, how- 
ever, perform short-term adaptation to their behavior. Our 
simplified communication is one-way (transmitters are never 
receivers). We also assume that there is a clock, and that 
transmissions only occur in the unit interval of time called 
a tick. Any subset of transmitters can each either send a 
single message to a single receiver in a tick, or not send a 
message at all. Each transmitter in a tick can send to a dif- 
ferent receiver, and two or more transmitters may send to 


the same receiver in the same tick. All messages’ contents 
are encrypted end-to-end. 

There is also an eavesdropper on the network called Eve. 
Since all transmissions are encrypted, they appear to the 
eavesdropper Eve as having indistinguishable content. Eve 
may be either a global passive adversary (GPA), with the 
ability to see link traffic on every link in the network, or 
a restricted passive adversary (RPA), with the ability to 
observe traffic only on certain links. 

Alice is not allowed any direct communication with Eve. 
However, Alice can influence what Eve sees on the network. 
We study network scenarios that attempt to achieve a degree 
of anonymity with respect to the network communication. 
That is, the networks are designed with various anonymity 
devices to prevent Eve from learning who is sending a mes- 
sage to whom. Even if a certain degree of anonymity is 
achieved, it still may be possible for Alice to communicate 
covertly with Eve. Note anonymous communication net- 
works were not designed with this covert channel threat in 
mind. Our study of these anonymity networks caused us 
to realize that even in what appears to be a benign form 
of communication, information may still leak out of the net- 
work. This may cause the system designer to rethink and/or 
modify their ideas. 


Figure 1: Restricted Passive Adversary Model. 

The main thrust of this paper is to analyze the situation 
where there are two enclaves, communication between them 
is encrypted, and packets are sent only from the first enclave 
(which contains Alice) to the second (Fig. 1). Eve is able 
to monitor the communication from the first enclave to the 
second. Anonymity is “achieved” in that an eavesdropper 
such as Eve (as RPA) does not “know” who is sending a 
message (that is hidden inside of the first enclave) nor who 
is receiving the message (this can only be known if one is 
interior to the second enclave). Eve is only allowed to know 
how many messages per tick travel from the first enclave 
to the second. Nonetheless, Alice attempts to communicate 
covertly with Eve. 

This paper analyzes the covert communication channel 
from Alice to Eve. We show that even if anonymity is taken 
into consideration with respect to system design, covert chan- 
nels may remain. As a baseline, we first consider situations 
in which no attempt at anonymity has been made (only 
encryption of the messages, so that they all appear to be 
identical to an eavesdropper). Later, we will consider covert 
channel capacity in networks with the stronger anonymity 
controls just described. 


2. BASE SCENARIO — NO ANONYMITY 

One transmitter 


Figure 2: Global Passive Adversary Model.

Alice is the only transmitter, and there are M possible 
receivers. Eve has knowledge of the network traffic (Eve 
is a GPA — see Figure 2). The only properties that Eve 
can discern from a message are its source (trivially Alice) and
its destination. Alice can use that fact to send information 
covertly to Eve. In this simplistic scenario Eve can see if 
Alice is sending a message, and if Alice is sending a message 
Eve can determine for which receiver the message is meant. 
This gives Alice the ability to signal Eve with an alphabet of 
M + 1 symbols: M symbols for the M different receivers, and
one symbol (“0”) for the choice of not sending a message. 

Since nothing is able to interfere with Alice’s transmission, 
we have a noiseless discrete memoryless channel (DMC) 
modeling the covert channel, whose capacity is log(M + 1) 
bits per tick. 1 
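
As a small illustration of the base scenario, the following sketch computes the noiseless capacity log(M + 1) and shows one way Alice's M + 1 symbols could map onto her actions in a tick. The function names and the convention that `None` stands for "send nothing" are assumptions made for this example only.

```python
import math

def baseline_capacity(num_receivers: int) -> float:
    """Capacity (bits per tick) of the noiseless base-scenario channel.

    The alphabet has one symbol per receiver plus the "0" symbol for
    not sending at all, hence log2(M + 1).
    """
    return math.log2(num_receivers + 1)

def encode_symbol(symbol: int, num_receivers: int):
    """Hypothetical helper: map a covert symbol to Alice's action in one tick.

    Symbol 0 means "stay silent" (returned as None); symbol k in 1..M
    means "send one message to receiver R_k" (returned as k).
    """
    if not 0 <= symbol <= num_receivers:
        raise ValueError("symbol must lie in 0..M")
    return None if symbol == 0 else symbol

for m in (1, 3, 7):
    print(m, baseline_capacity(m))
```

With M = 1 the channel carries exactly one bit per tick, which is the "send or do not send" channel that reappears as Case 2.0 below.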

Several transmitters 

Now, if there are other transmitters aside from Alice, but 
their transmissions to any of the M receivers do not affect 
Alice’s transmissions, then the covert channel from Alice to 
Eve is as above. This would be the case if the links into 
a receiver can handle all of the traffic meant for them. Of 
course, if the link capacity into a receiver does affect
the number of receivable transmissions then that introduces
noise into the channel and the capacity is obviously less than
log(M + 1). This is a line of research worth pursuing.

Anonymity discussion 

In the above scenario Alice can obviously leak considerable 
information to Eve. This is no secret to the anonymity com- 
munity, e.g., [2, 3, 4, 5, 8, 18, 19, 22, 23] (while the preced- 
ing list is only a representative sample of papers/URLs on 
the topic, these papers relate particularly well to what we 
discuss in this paper). However, in the past the concerns 
have focused on retaining or regaining anonymity. It is the 
“anonymity lost” that we exploit for covert communication. 
If there were “perfect” anonymity, 2 then we would not
expect to find a covert channel.

1 All logarithms are base 2; the units of capacity are bits per
tick.

2 We intentionally leave the notion of perfect anonymity 
as fuzzy in this paper. We ponder the somewhat circular 
question: If we did have perfect anonymity, how could we 





To provide anonymity, transmissions from a transmitter 
are often first sent to an intermediary, such as a MIX [5] 
or an onion router [18], before they are forwarded to the 
receiver. This has the effect of hiding where the message 
is going. Thus, these intermediaries serve to anonymize the 
transmission. Of course, Eve still knows the set of those who 
receive a message, and she also knows the set of those who 
sent a message, but she does not know who sent a message 
to whom. It is interesting that, even when we seem to have 
“good” statistical anonymity, Alice may still non-trivially 
be able to communicate covertly with Eve. 

The use of a MIX alone does not prevent Alice from covert 
communication with Eve. In fact there are two possible 
situations when Alice is the only transmitter. 

1. Alice signals Eve by sending or not sending a message. 
A MIX alone does nothing to prevent Eve from learn- 
ing this information (this is not what a MIX is designed 
to do). We discuss this further at the beginning of the
next section. Therefore Alice has a noiseless channel 
to Eve, with capacity = 1. 

2. Alice signals Eve by sending a message to any one of M 
different receivers. Eve simply sees where messages are 
going when they leave the MIX (a concern well-known 
to MIX designers). This allows a covert channel with 
a capacity of log(M + 1). If there are other users, their
behavior affects what Eve is receiving and the capacity 
is then less than log(M + 1). 

We will not study the latter situation in this paper, be- 
cause we do not use pure MIXes. Instead, we use MIXes 
acting as firewalls. 

3. SCENARIO 2: INDISTINGUISHABLE RECEIVERS - 2 MIX-FIREWALLS

Consider the situation in which every message goes into 
the anonymizing intermediary referred to as a MIX [5]. The 
MIX has the effect of hiding the “linking” knowledge of 
which transmission is sent to which receiver. In other words, 
Eve knows who is transmitting and who is receiving, but in 
general, Eve does not know which transmitter is sending to 
which receiver. This assumes that Eve is a GPA. Of course, 
if only one transmitter is operating then the MIX hides noth- 
ing. In other words the MIX gives statistical anonymity. 
The amount of anonymity has been measured as the log of 
the number of transmitters (anonymity set size), sometimes
in conjunction with probabilistic behavior (e.g., [3, 4, 5, 8, 
23]). 

The main concern of this paper is not with measuring 
anonymity, rather it is the amount of covert information 
that may be leaked through less than perfect anonymity. 
However, we do note the very important observation from 
our study: the ability to covertly communicate arises due 
to a lack of anonymity. As the number of transmitters 
goes up and as the transmitters behave in a “uniform (equi- 
probabilistic) manner,” the anonymity increases and we will 
show that the covert channel capacity diminishes. 

have covert communication? We thank P. Syverson for his 
thoughts. 


For Scenario 2 we assume that there are transmitters Alice
and Clueless_i, i = 1, ..., N. The N Clueless_i transmitters
behave independently of each other and of Alice, and
they all have the same time-invariant probabilistic behavior.
Throughout this paper we assume that Alice acts independently
of the Clueless_i. Alice and the Clueless_i are hidden
from Eve. They submit their messages to a MIX that also 
functions as a firewall. This first MIX-firewall acts as an 
exit point. This MIX-firewall sends its encrypted messages 
to a second MIX-firewall that is an entrance to a second hid- 
den (from Eve) enclave. We further assume that Eve only 
has knowledge of how many messages come out of the first 
MIX-firewall per tick, and Eve does not know to whom the 
messages are going. Thus Eve is an RPA. The situation is 
described by the following diagram (Figure 3).

Figure 3: MIX-firewalls with Restricted Passive Adversary.

This situation is realistic 3 if the MIXes are acting as (first)
firewall exit and (second) entrance points, or if the MIXes are
onion-type routers acting as firewalls. Therefore, the only
knowledge that Eve can get by eavesdropping is the number of
messages per tick passing between the two MIX-firewalls. In
other words, every tick, Eve observes the number of packets
leaving the MIX-firewall and “receives” some number from
the set {0, 1, ..., N + 1}.

Therefore the only quantity observable by Eve that Alice
can affect, per tick, is the number of messages that Eve
counts. This covert channel is a discrete memoryless channel
with noise since the Clueless_i randomly affect the output.
Shannon’s information theory [24] tells us how useful
the channel is.

Let us go back to the base scenario; here we stated that 
the capacity is obviously log (M + 1). How do we know that 
some other exploitation of the base scenario will not give 
us a higher capacity? The reason is that there are at most 
M + 1 symbols in whatever exploitation we use, and if the 
channel is noiseless we have maximized the capacity (this is 
related to the maximum entropy as discussed in [15].) For 
Scenario 2 capacity cannot be explained so easily and is the 
major study of this paper. 

Keep in mind that for Scenario 2 it does not matter if there 
is one receiver or there are one hundred and one receivers. 
Eve can only count, and Alice or Clueless_i can only send
one message per tick. Therefore the number of receivers
does not matter. It is only important that there is at least
one receiver.

3 Consider the case of packets from one LAN/enclave being
sent to another LAN/enclave using IPSEC tunneling
[10]. In this case, an eavesdropper can only count the number
of outgoing messages destined for the receiving enclave.
What goes on inside each LAN/enclave is hidden from an
eavesdropper. If UDP with no application level ACKs is
employed, communication is only one-way [20].

We break Scenario 2 down into four cases: 2.0, 2.1, 2.2, 
and 2.3. Case 2.3 is the general form of Scenario 2 and the 
first three are simplified special cases. 

3.1 Two special cases of Scenario 2: Alice alone, and with
one additional transmitter

Case 2.0 — Alice 

This is the case where N = 0. Alice is the only transmitter. 
Alice sends either 0 (by not sending a message) or 0^c (by
sending a message). Eve receives either e_0 = 0 (Alice did
nothing) or e_1 = 1 (Alice sent a message to a receiver). The
capacity of this noiseless covert channel is 1. 

Note though the capacity is the maximum, over the prob- 
ability x for Alice inputting a 0, of the mutual information 
I(E,A). A is the distribution for Alice described by x, and 
E is the distribution for Eve. Since there is no noise, I is sim- 
ply the entropy H(E) describing Eve (which is maximized 
to 1 when x = .5). 

\[
I(E, A) = H(E) = -x \log x - (1 - x) \log(1 - x).
\]

These terms are made precise later in this section. 
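
The claim that H(E) peaks at 1 bit when x = .5 can be checked with a few lines of code. This is only an illustrative sketch: the function name `h` and the grid resolution are choices made here, not part of the analysis.

```python
import math

def h(x: float) -> float:
    """Binary entropy in bits, using the convention 0 * log(0) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

# Evaluate H(E) = h(x) on a grid of input probabilities x = P(A = 0).
grid = [i / 1000 for i in range(1001)]
best_x = max(grid, key=h)
print(best_x, h(best_x))
```

The search returns x = 0.5 with h(x) = 1, matching the capacity of 1 bit per tick stated for Case 2.0.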

Case 2.1 — Alice and one additional transmitter 
(Clueless) 

In this case N = 1. Therefore, Eve receives:

• 0 if neither Alice nor Clueless transmit;

• 1 if Alice does not transmit and Clueless does transmit,
or Alice transmits and Clueless does not; or

• 2 if both Alice and Clueless transmit.

A is the input random variable describing Alice, and E is
the output random variable describing Eve. Clueless contributes
to the noise, but is not modeled as an input. Alice
communicates with Eve via the covert channel. The input
symbols for the channel are 0, which signifies that Alice is
not transmitting a message to any receiver, and 0^c, which
signifies that Alice is transmitting a message to some receiver
(keep in mind that Alice is oblivious to the other
transmitters). 4

4 At this point we caution the reader not to confuse Alice
transmitting a message to a receiver R_i, and Alice communicating
to Eve via the covert channel. Eve is not the
receiver R_i in the sense of Alice or Clueless transmitting a
message. Eve receives symbols via the covert channel from
Alice. There are two different communication paths that
must be kept separate. One is the legitimate network communication
that the anonymizing device attempts to keep
unknown. The other is the covert communication that Alice
has to Eve. A way to stop the covert communication
would be for the anonymizing device to pad [15, 16, 17, 26,
27] messages so that it would appear to Eve that both Alice
and Clueless are transmitting a message. This inefficiency
might be tolerated in such an ideal situation as Case 2.1, but
such a strategy must be called into question when it comes
to real traffic. In Case 2.1 the anonymizing effect is done
by a MIX-firewall, which does not a priori pad. Of course,
before advocating traffic padding one should be fully aware
of the threat that the padding is intended to stop. Failure
to understand the threat first is inadvisable since padding
comes at the pragmatic costs of efficiency and proper network
resource utilization.

Figure 4: Channel model for Case 2.1. (a) Channel block
diagram (anonymizing network). (b) Channel transition diagram.

Part (b) of Fig. 4 shows the output symbols corresponding
to the three states E might perceive. Let us consider the
channel matrix, with rows indexed by Alice's input (0, 0^c)
and columns by Eve's output (0, 1, 2):

\[
M_{2.1} = \begin{pmatrix} p & q & 0 \\ 0 & \alpha & \beta \end{pmatrix}
\]

The 2x3 channel matrix M_{2.1}[i, j] represents the conditional
probability of Eve receiving the symbol j when Alice sends
the symbol i. Here p is the probability that Clueless does not
transmit in a given tick, and q = 1 - p is the probability that
Clueless does transmit. It follows that p = α, and thus it
trivially follows that q = β.

So our channel matrix simplifies to:

\[
M_{2.1} = \begin{pmatrix} p & q & 0 \\ 0 & p & q \end{pmatrix}
\]

The probability that Alice sends a 0 is P(A = 0) = x,
and therefore P(A = 0^c) = 1 - x. The term x is the only
term that can be varied to achieve capacity. Here is where 
Alice may use knowledge of long-term transmission char- 
acteristics of the other transmitters, as well as how many 
other transmitters there are, to change her (long-term) be- 
havior. As with other studies of covert channels [13] we are 
not concerned with source coding/decoding issues [24]. Our 
concern is the limits on how well a transmitter can “opti- 
mize” its bit rate to a receiver, given that a channel is noisy. 
Given a discrete random variable X, taking on the values
x_i, i = 1, ..., n_X, the entropy of X is:

\[
H(X) = -\sum_{i=1}^{n_X} p(x_i) \log p(x_i).
\]

We use p(x_i) as a shorthand notation for P(X = x_i). Given
two such discrete random variables X and Y we define the
conditional entropy (equivocation) to be:

\[
H(X|Y) = -\sum_{i=1}^{n_Y} p(y_i) \sum_{j=1}^{n_X} p(x_j|y_i) \log p(x_j|y_i).
\]

Given two such random variables we define the mutual
information between them to be:

\[
I(X, Y) = H(X) - H(X|Y).
\]

Note that H(X) - H(X|Y) = H(Y) - H(Y|X), so we see
that I(X, Y) = I(Y, X).
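
The definitions above translate directly into code. The sketch below computes the mutual information both ways, as H(Y) - H(Y|X) and as H(X) - H(X|Y), so that the symmetry can be checked numerically. The function names and the example input distribution are illustrative assumptions; the channel rows follow the Case 2.1 pattern with p = .8.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(v * math.log2(v) for v in dist if v > 0)

def mutual_information_both_ways(px, channel):
    """Return (H(Y) - H(Y|X), H(X) - H(X|Y)).

    px[i] is P(X = i) and channel[i][j] is P(Y = j | X = i).
    """
    nx, ny = len(px), len(channel[0])
    joint = [[px[i] * channel[i][j] for j in range(ny)] for i in range(nx)]
    py = [sum(joint[i][j] for i in range(nx)) for j in range(ny)]
    # H(Y|X): average entropy of the channel rows, weighted by the input.
    h_y_given_x = sum(px[i] * entropy(channel[i]) for i in range(nx))
    # H(X|Y): average entropy of the posterior on X, weighted by the output.
    h_x_given_y = sum(
        py[j] * entropy([joint[i][j] / py[j] for i in range(nx)])
        for j in range(ny) if py[j] > 0
    )
    return entropy(py) - h_y_given_x, entropy(px) - h_x_given_y

example_channel = [[0.8, 0.2, 0.0], [0.0, 0.8, 0.2]]
forward, backward = mutual_information_both_ways([0.3, 0.7], example_channel)
print(forward, backward)
```

Both values agree up to floating-point rounding, as the identity I(X, Y) = I(Y, X) requires.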

For a DMC whose transmitter random variable is X, and 
whose receiver random variable is Y, we define the channel 
capacity [24] to be: 

\[
C = \max_{p(x_i)} I(X, Y),
\]

where the maximization is over all possible distribution values
p(x_i) (that is, the p(x_i) are all non-negative and sum to
one).

For us, the capacity of the covert channel between Alice 
and Eve is 

\[
C = \max_{x} \{ H(E) - H(E|A) \}.
\]

Given the above channel matrix we have: 

\[
H(E) = -\{\, px \log(px) + [qx + p(1 - x)] \log[qx + p(1 - x)] + q(1 - x) \log(q(1 - x)) \,\}
\]

and

\[
H(E|A) = -\sum_{i=0}^{1} p(a_i) \sum_{j=0}^{2} p(e_j|a_i) \log p(e_j|a_i) = h(p),
\]

where h(p) denotes the function -p log p - (1 - p) log(1 - p).
Thus,

\[
C = \max_{x} \{\, -( px \log(px) + [qx + p(1 - x)] \log[qx + p(1 - x)] + q(1 - x) \log(q(1 - x)) ) - h(p) \,\}.
\]

We cannot analytically find the x that maximizes the mutual 
information, even doing the standard trick of setting the 
derivative of the mutual information to zero. However, we 
numerically show our results in Figure 5. 
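
A numerical maximization of this kind can be sketched as follows. This is not the code used to produce Figure 5: the function names, the brute-force grid search over x, and the grid resolution are all assumptions chosen for simplicity.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(v * math.log2(v) for v in dist if v > 0)

def mutual_information_case_2_1(x, p):
    """I(E, A) = H(E) - h(p) for Case 2.1.

    x = P(A = 0); p = probability that Clueless does not send in a tick.
    """
    q = 1 - p
    output = [p * x, q * x + p * (1 - x), q * (1 - x)]
    return entropy(output) - entropy([p, q])

def capacity_case_2_1(p, steps=2000):
    """Grid-search approximation of C(p); returns (capacity, best x)."""
    grid = [i / steps for i in range(steps + 1)]
    best_x = max(grid, key=lambda x: mutual_information_case_2_1(x, p))
    return mutual_information_case_2_1(best_x, p), best_x

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    c, x = capacity_case_2_1(p)
    print(f"p={p:.1f}  C={c:.4f}  x*={x:.4f}")
```

At p = .5 the mutual information reduces to h(x)/2, so the search returns a capacity of one half bit per tick at x = .5; at p = 0 and p = 1 the channel is noiseless and the capacity is 1.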



Figure 5: Plots of covert channel capacity as a func- 
tion of p, and of the x value that maximizes the 
mutual information as a function of p. 


We see in Figure 5 certain symmetries. The capacity 
graph is symmetric about p = .5, and the graph of the x 
that achieves capacity is skew-symmetric about p = .5.

Consider the two situations where p = ε, and where p =
1 - ε; in both situations 0 < ε < .5. Let x_ε be the probability
for the input symbol 0 that achieves capacity in the
first situation, and let x_{1-ε} be the probability that achieves
capacity for the second situation. For the first situation we
have that 1 - x_ε is the capacity achieving probability for
the input symbol 0^c, and similarly for the second situation
1 - x_{1-ε} is the capacity achieving probability for the input
symbol 0^c. Physically the two situations are “the same” if
we reverse the roles of the output symbols 0 and 2. Therefore
x_ε = 1 - x_{1-ε}. Writing x_ε as x_ε = 1/2 + Δ, we see that
x_{1-ε} = 1/2 - Δ; this is what the lower dotted plot shows in
Figure 5 (ε = 1/2 ⇒ Δ = 0).
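
The skew-symmetry of the capacity-achieving input can be checked numerically. The sketch below is illustrative only: it uses a grid search with an arbitrarily chosen resolution, so the two optimal inputs are only expected to sum to 1 up to the grid spacing.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(v * math.log2(v) for v in dist if v > 0)

def mutual_information(x, p):
    """I(E, A) for Case 2.1 with P(A = 0) = x and P(Clueless silent) = p."""
    q = 1 - p
    output = [p * x, q * x + p * (1 - x), q * (1 - x)]
    return entropy(output) - entropy([p, q])

STEPS = 2000  # grid resolution, an arbitrary choice for this sketch

def best_input(p):
    """Grid-search the x that maximizes I(E, A); returns (x, I)."""
    x = max((i / STEPS for i in range(STEPS + 1)),
            key=lambda v: mutual_information(v, p))
    return x, mutual_information(x, p)

results = []
for eps in (0.1, 0.25, 0.4):
    x_lo, c_lo = best_input(eps)        # situation p = eps
    x_hi, c_hi = best_input(1 - eps)    # situation p = 1 - eps
    results.append((x_lo, x_hi, c_lo, c_hi))
    print(eps, x_lo, x_hi, round(c_lo, 4), round(c_hi, 4))
```

For each ε the two optimal inputs sum to 1 (within one grid step) and the two capacities coincide, which is the symmetry visible in Figure 5.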

Observation 1. In conditions of very little extra traffic, 
or very high extra traffic, the covert channel from Alice to 
Eve has higher capacity. 

Observation 2. The capacity C(p), as a function of p, is
strictly bounded below by C(.5), and C(.5) is achieved when
the mutual information is evaluated at x = .5.

It is obvious that very little extra traffic corresponds to 
very little noise. At first glance though, it seems counterin- 
tuitive that heavy traffic also corresponds to a small amount 
of noise. This is because the high traffic is used as a baseline 
against which to signal. This is analogous to transmission 
of bits over a channel where the bit error rate (BER) P_e is
greater than 1/2. In this case, the capacity of the channel is
the same as that of a channel with BER of 1 - P_e, by first
inverting all the bits. It is the in-between situations that 
negatively affect the signaling ability of Alice. But, even in 
the noisiest case (i.e., where p = .5) Alice can still transmit 
with a capacity of a half bit per tick. 
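
The bit-inversion analogy can be made concrete with the standard capacity formula for a binary symmetric channel, C = 1 - h(P_e). That formula is textbook material rather than a result of this paper, and the function names below are chosen for this sketch only.

```python
import math

def h(p: float) -> float:
    """Binary entropy in bits, using the convention 0 * log(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(bit_error_rate: float) -> float:
    """Capacity of a binary symmetric channel with the given BER."""
    return 1.0 - h(bit_error_rate)

for pe in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(pe, round(bsc_capacity(pe), 4))
```

A BER of .9 yields the same capacity as a BER of .1, and only the BER of .5 reduces the capacity to zero. The covert channel of Case 2.1 is more forgiving: even at p = .5 it retains half a bit per tick.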

Note that we can never guarantee error-free transmission, 
no matter how we group the output symbols. In fact, it 
is possible that the outputs will always be the symbol 1 (of 
course the probability of this quickly approaches zero, as the 
number of transmissions goes up). So this covert channel 
has a zero-error capacity [25] of zero. Capacity is a useful 
measure of a communication channel if the assumption is 
that the transmitter can transmit a large number of times. 
With a large number of transmissions, an error-correcting 
code can be utilized so as to achieve a rate close to capacity. 
If the transmitter only transmits a small number of trans- 
missions, then using the capacity alone can be misleading. 

3.2 Case 2.2: Alice and two additional transmitters (N = 2)

This is similar to Case 2.1, the difference being that we 
have three possible transmitters, A (random variable as be- 
fore) for Alice, who is attempting to communicate covertly 
with E (random variable as before) for Eve, and two other 
benign “clueless” transmitters. Since the MIX-firewalls only 
allow Eve to count the number of outgoing messages, our 
covert channel has four possible output symbols (the inputs 
are as before 0, for Alice not sending a message, and 0^c, if
Alice does send a message). The outputs are: 

• 0: no one sends a message;

• 1: Alice sends a message, and neither Clueless_i sends
a message; or, Alice does not send a message, and one,
and only one, Clueless_i sends a message;

• 2: Alice sends a message and one, and only one,
Clueless_i sends a message; or, Alice does not send a
message and both Clueless_i send a message;

• 3: Alice, Clueless_1, and Clueless_2 all send a message.


As stated earlier we assume that Clueless_1 and Clueless_2
act independently of each other (and Alice is independent
of them). Therefore, if, as before, p is the probability of a
clueless transmitter (Clueless_1 or Clueless_2) not sending a
message into the MIX-firewall, and q = 1 - p is the probability
of a clueless transmitter sending a message, the conditional
probabilities of E given Alice's input (0 or 0^c) are shown in
the covert channel diagram and channel matrix in Figure 6.
(a) Channel transition diagram

\[
M_{2.2} = \begin{pmatrix} p^2 & 2qp & q^2 & 0 \\ 0 & p^2 & 2qp & q^2 \end{pmatrix}
\]

(rows: Alice's input 0, 0^c; columns: Eve's output 0, 1, 2, 3)

(b) Channel matrix

Figure 6: Channel for Case 2.2.
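
The channel matrix of Figure 6 can be evaluated numerically in the same way as Case 2.1. The sketch below builds both matrices and compares the resulting capacities; the function names and the grid search are illustrative assumptions, not the procedure used for the paper's plots.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(v * math.log2(v) for v in dist if v > 0)

def matrix_case_2_1(p):
    q = 1 - p
    return [[p, q, 0.0], [0.0, p, q]]

def matrix_case_2_2(p):
    q = 1 - p
    row = [p * p, 2 * q * p, q * q]  # number of clueless senders: 0, 1, 2
    return [row + [0.0], [0.0] + row]

def mutual_information(x, matrix):
    """I(E, A) = H(E) - H(E|A) with P(A = 0) = x."""
    out = [x * a + (1 - x) * b for a, b in zip(matrix[0], matrix[1])]
    noise = x * entropy(matrix[0]) + (1 - x) * entropy(matrix[1])
    return entropy(out) - noise

def capacity(matrix, steps=2000):
    """Grid-search approximation of the capacity over x in [0, 1]."""
    return max(mutual_information(i / steps, matrix) for i in range(steps + 1))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    c1, c2 = capacity(matrix_case_2_1(p)), capacity(matrix_case_2_2(p))
    print(f"p={p:.1f}  one clueless: {c1:.4f}  two clueless: {c2:.4f}")
```

At p = .5 the second clueless transmitter lowers the capacity from .5 to roughly .311 bits per tick.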
We can easily observe that the zero-error capacity is zero
because the output symbols 1 and 2 can both be received
if 0 or 0^c is transmitted. Therefore there is always some
statistical error in what is received. This is similar to Case
2.1. For capacity itself, after some numerical calculation we
plot the capacity in Fig. 7.

Figure 7: Capacity as a function of p for Alice with
two additional transmitters.

Except for the boundary values, the capacity is always less
for a given p with three transmitters (two clueless) than with
two (one clueless). This is not surprising: the extra clueless
transmitter means extra noise. Note that the noisiest case
is when p = .5, which again acts as a lower bound.

Unfortunately we cannot derive closed form solutions even
for these simple cases. Therefore, it seems unlikely that we
can derive a closed form for the general case of N clueless
transmitters in addition to Alice. Of course, we could still
derive the capacity numerically. However, we are able to
obtain some bounding results.

3.3 Case 2.3: Alice and N additional transmitters

Case 2.3 is the general form of Scenario 2; see Figure 8.
Now 5 we imagine that there are N + 1 transmitters, Alice
is one of them, and the other N are all independent, identically
behaving clueless transmitters. That is, there are transmitters
Clueless_1, Clueless_2, ..., Clueless_N. Again, Eve can only see
how many messages are leaving the first MIX-firewall headed
for the second MIX-firewall. Therefore Eve can determine if
there are 0, 1, ..., N + 1 messages leaving the firewall. That
is all Eve can determine. Therefore, there are still the two
input symbols a_0 = 0 and a_1 = 0^c, but we have N + 2
output symbols. The probability that Clueless_i does not
send a message is still p, and that it does send a message is
q = 1 - p. Now, calculate the channel matrix. Keep in mind
that Alice acts independently of the Clueless_i.

5 One could relax the assumption that all the Clueless_i have
identical and independent behavior.

Alice sends a 0.

• For Eve to receive e_k (that is E = k), 0 ≤ k ≤ N, we
need k of the clueless transmitters to send a message,
and N - k not to send a message. Therefore,

\[
p(e_k | A = 0) = \binom{N}{k} p^{N-k} q^{k}, \quad 0 \le k \le N.
\]

• p(e_{N+1} | A = 0) = 0.

Alice sends a 0^c.

• p(e_0 | A = 0^c) = 0, since the event never happens.

• For Eve to receive e_k (that is E = k), 1 ≤ k ≤ N + 1,
we need k - 1 of the clueless transmitters to send a
message, and N - k + 1 not to send a message. Therefore,

\[
p(e_k | A = 0^c) = \binom{N}{k-1} p^{N-k+1} q^{k-1}, \quad 1 \le k \le N + 1.
\]

(a) Channel transition diagram

The channel matrix (rows: Alice's input 0, 0^c; columns:
Eve's output 0, 1, 2, ..., N, N + 1) is

\[
\begin{pmatrix}
p^N & N p^{N-1} q & \binom{N}{2} p^{N-2} q^2 & \cdots & q^N & 0 \\
0 & p^N & N p^{N-1} q & \cdots & N p q^{N-1} & q^N
\end{pmatrix}
\]

(b) Channel matrix

Figure 8: Channel for Case 2.3, the general case of
N clueless users.

We delegate to the appendix the outline of the following
important results (the full details and proofs are in [14]).

• For any p, C(p) is strictly bounded below by C(.5).

• As the number of clueless transmitters goes to infinity,
C(.5) goes to zero.

• C(p) is a continuous function of p.
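
The general channel matrix and the behavior of C(.5) as N grows can be explored numerically. The following is a minimal sketch under stated assumptions: the function names are invented here, capacity is approximated by a grid search over x, and `math.comb` requires Python 3.8 or later. It is a numerical illustration of the results listed above, not a proof.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(v * math.log2(v) for v in dist if v > 0)

def channel_matrix(n, p):
    """Case 2.3 channel matrix for n clueless transmitters.

    Row 0: Alice sends 0; row 1: Alice sends 0^c.
    Column k: Eve counts k messages, k = 0..n+1.
    """
    q = 1 - p
    binom = [math.comb(n, k) * p ** (n - k) * q ** k for k in range(n + 1)]
    return [binom + [0.0], [0.0] + binom]

def mutual_information(x, matrix):
    """I(E, A) = H(E) - H(E|A) with P(A = 0) = x."""
    out = [x * a + (1 - x) * b for a, b in zip(matrix[0], matrix[1])]
    noise = x * entropy(matrix[0]) + (1 - x) * entropy(matrix[1])
    return entropy(out) - noise

def capacity(n, p, steps=1000):
    """Grid-search approximation of C(p) for n clueless transmitters."""
    m = channel_matrix(n, p)
    return max(mutual_information(i / steps, m) for i in range(steps + 1))

for n in (0, 1, 2, 5, 10, 50):
    print(n, round(capacity(n, 0.5), 4))
```

For N = 0, 1, 2 this reproduces the values 1, .5, and about .311 from Cases 2.0 to 2.2, and the printed values keep shrinking as N increases while staying positive.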

4. COMMENTS, GENERALIZATIONS & FUTURE WORK

We first note that despite the obfuscation provided by 
MIX-firewalls, and the attendant noise introduced by other 
transmitters, Alice is still able to transmit information to 
Eve. At this point, we recall our earlier observations and 
add to them below. 

1. In conditions of very little extra traffic, or very high 
extra traffic, the covert channel from Alice to Eve has 
higher capacity. 

2. The capacity C(p), as a function of p, is strictly bounded
below by C(.5), and C(.5) is achieved when the mutual
information is evaluated at x = .5 (of course p = .5 also
in this situation).

3. The capacity C(p), as a function of p, is strictly bounded
below by a function that decreases monotonically to
zero as the number of transmitters increases, but is
never zero.

4. The bias in the code used by Alice to achieve the op- 
timum data rate on the channel is not always x = 0.5, 
but it is never far from 0.5, and our preliminary exper- 
imental results indicate that the difference in capacity 
is minor. 



Figure 9: Exit firewall only 

This last observation agrees with [12], which presents the
general result that in binary-input DMCs, the mutual information
obtained by using x = .5 is no less than 94.21% of the channel
capacity. Even if Alice has no knowledge of the probabilistic
behavior of the other transmitters, her data rate will
not be too far from optimal if she uses an unbiased code.
(Note, however, that the coding rate is very much depen- 
dent on knowledge of the number of other transmitters and 
their behavior.) 
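
For the Case 2.1 channel, the gap between the unbiased code and the optimum can be measured directly. The sketch below compares the mutual information at x = .5 with a grid-search approximation of the capacity; the function names, the grid resolution, and the sampled values of p are assumptions made for this illustration.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(v * math.log2(v) for v in dist if v > 0)

def mutual_information(x, p):
    """I(E, A) for Case 2.1 with P(A = 0) = x and P(Clueless silent) = p."""
    q = 1 - p
    output = [p * x, q * x + p * (1 - x), q * (1 - x)]
    return entropy(output) - entropy([p, q])

def uniform_rate_ratio(p, steps=2000):
    """Ratio of the rate at x = .5 to the (grid-approximated) capacity."""
    rate_uniform = mutual_information(0.5, p)
    cap = max(mutual_information(i / steps, p) for i in range(steps + 1))
    return rate_uniform / cap

ratios = [uniform_rate_ratio(i / 20) for i in range(1, 20)]
print(min(ratios))
```

Over the sampled values of p the ratio stays well above the 94.21% floor, consistent with the remark that the unbiased code costs Alice very little on this channel.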

In future work we will also analyze the situation where we 
have only an exit point MIX- firewall as shown in Figure 9. 

We have M receivers denoted R_1, ..., R_M. Eve still does
not know directly who sent a message, but Eve does know
where messages are going. This increases the capacity of
the covert channel. Alice now, instead of just sending 0 or
0^c, can send: 0 (not transmitting); 1 (message to the first
receiver); ...; i (message to the ith receiver); ...; M (message
to the Mth receiver). The greatest the capacity can be is
log(M + 1). Of course if M = 1 the situation reduces to
Scenario 2.

(See [14] for other related scenarios.) 

Other areas begging for further investigation include sce- 
narios in which there is limited network capacity (on links or 
aggregate), whether or not there is anonymity. We are currently
investigating this using the model in which at most B
messages can be sent through the network (as output from
a sender or as output of a MIX-firewall) in a given tick, and
if there are more than B messages awaiting transmission, B 
of them are chosen at random for delivery. This may relate 
the work to more sophisticated MIX models, such as pool 
MIXes, which is also desirable. 

A deeper issue raised in this preliminary paper is that 
of the relationship between anonymity and covert channel 
capacity (fixing the other factors that affect capacity). It 
seems evident that as system level anonymity increases in 
the simple models shown here (i.e., the number of poten- 
tial senders increases), the minimum capacity decreases to 
zero. However, as the probability that a Clueless sender 
transmits in a given tick increases, the expected number of 
actual senders in a given time tick also increases, hence the 
anonymity increases, but the capacity of the covert channel 
increases once this probability exceeds 0.5. The relation- 
ships are not simple, but their discovery has the potential 
to increase our understanding of fundamental aspects of network
design.

APPENDIX
A. APPENDIX 

Now we show that C(.5) is a tight lower bound for C(p),
and that C(.5) goes to zero as the number of Clueless
transmitters goes to infinity. We also discuss a continuity
result for C(p). We continue with the general Case 2.3.

Since p(e_k) = p(e_k | A = 0) P(A = 0) + p(e_k | A = 0^c) P(A = 0^c),
we have that
Consider the entropy of E evaluated when p = .5.
Consider the conditional entropy when p = .5.


The mutual information is
(For Case 2.1 (one Clueless in addition to Alice) and for
Case 2.2 (two Clueless in addition to Alice) we discussed the
symmetry about p = .5 informally.)


Theorem 1. I(E,A)|_{x,p} = I(E,A)|_{1-x,q}


PROOF: See [14] 
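One way to see why such a symmetry is plausible is to relabel Eve's observation k as N + 1 - k. The sketch below is our own illustration and not necessarily the argument of [14]; it assumes the Scenario 2 model in which each of the N Clueless senders transmits independently with probability p (q = 1 - p), x = P(A = 0), and Eve observes only the message count E.

```latex
% Conditional distributions of Eve's count E when the Clueless probability is p:
P_{p}(E = k \mid A = 0)   = \binom{N}{k}   p^{k}   q^{N-k}, \qquad
P_{p}(E = k \mid A = 0^c) = \binom{N}{k-1} p^{k-1} q^{N-k+1}.
% Swapping p <-> q and relabelling k -> N+1-k exchanges the two rows:
P_{q}(E = N+1-k \mid A = 0^c) = \binom{N}{N-k}   q^{N-k}   p^{k}
                              = P_{p}(E = k \mid A = 0),
P_{q}(E = N+1-k \mid A = 0)   = \binom{N}{N+1-k} q^{N+1-k} p^{k-1}
                              = P_{p}(E = k \mid A = 0^c).
% Swapping x <-> 1-x exchanges the input probabilities of 0 and 0^c to match.
% Mutual information is unchanged by relabelling inputs and outputs, hence
I(E,A)\big|_{x,p} = I(E,A)\big|_{1-x,q}.
```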


H(E|A)|_{p=.5} = N -



Note that H(E|A)|_{p=.5} is independent of x. Keep in mind
that we may express the mutual information evaluated at
(x', p') by the slightly overloaded notation I(E,A)|_{x=x',p=p'}.
Of course I(E,A)|_{p=p'} is simply a function of x, and
I(E,A)|_{x=x'} is a function of p.


Definition 1. We say that an arbitrary (real valued) function
f is not locally-constant iff for all x at which f is defined,
and for every δ > 0, there exists an x' such that d(x', x) < δ
(i.e., x' in the neighborhood of x) with f(x') ≠ f(x).


That is, for no neighborhood, no matter how small, is the 
function constant. 


Definition 2. We say that a function f : [0, 1] → ℝ is
symmetric about x = .5 iff f(x) = f(1 - x).

Observation 3. If f(x) is symmetric about x = .5 and
it is concave down (convex up) then f(.5) is a maximum
(minimum) value. Further, if f(x) is not locally-constant
then .5 is the only such critical point.

Theorem 2. I(E,A)|_{p=.5} is symmetric about x = .5.
PROOF: By Thm. 1, I(E,A)|_{x,.5} = I(E,A)|_{1-x,.5}.

Theorem 3. C(.5) = I(E,A)|_{x=.5,p=.5}.

PROOF: By Theorem 2, we know that I(E,A)|_{p=.5} is symmetric
about x = .5, and [9][Thm. 4.4.2] and [7][Thm. 2.7.4]
show that I(E,A)|_{p=.5} (and in general I(E,A) for fixed p) is
concave down in x. Therefore, from Observation 3, I(E,A)|_{p=.5}
attains its maximum value when x = .5. Since capacity,
when p = .5, is the maximum of I(E,A)|_{p=.5}, we are done.

We will need the following in the rest of the appendix,
so we now consider I(E,A)|_{p=.5} = H(E)|_{p=.5} - H(E|A)|_{p=.5}.

Theorem 4. C(p) ≥ I(E,A)|_{x=.5,p=.5}.

PROOF: By definition C(p) ≥ I(E,A)|_{x=.5}, since capacity
is the maximum of the mutual information. For x fixed,
I(E,A)|_x is a convex up function of p (see [9][Thm. 4.4.2]
and [7][Thm. 2.7.4]). By Thm. 1 we see that I(E,A)|_{x=.5}
is symmetric about p = .5. By Observation 3 we see that
I(E,A)|_{x=.5} ≥ I(E,A)|_{x=.5,p=.5}.

This allows us to use I(E,A)|_{x=.5,p=.5} (a simple single value)
as a lower bound for the covert channel capacity.

Corollary 1. C(p) ≥ C(.5).

PROOF: Apply Theorems 3 and 4 together.

Theorem 5. C(p) = C(1 - p), and if x_p is the unique x
such that C(p) = I(E,A)|_{x_p,p}, then x_{1-p} = 1 - x_p.

PROOF: This follows directly from Thm. 1 and the uniqueness
of the critical x value (which follows from the concavity
properties and the fact that the mutual information is not
locally-constant).
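Corollary 1 and Theorem 5 can be spot-checked numerically. The sketch below is illustrative only: it assumes the Scenario 2 model (N independent Clueless senders, each transmitting with probability p, with Eve seeing only the message count), the function names are ours, and the maximizing x is located by ternary search rather than by any method from the paper.

```python
from math import comb, log2


def mutual_information(x, p, n):
    """I(E, A) in bits for x = P(A = 0), under the assumed count channel."""
    binom = [comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]
    silent, sending = binom + [0.0], [0.0] + binom

    def h(dist):
        # Shannon entropy in bits, skipping zero-probability outcomes
        return -sum(v * log2(v) for v in dist if v > 0.0)

    mixed = [x * s + (1.0 - x) * t for s, t in zip(silent, sending)]
    return h(mixed) - x * h(silent) - (1.0 - x) * h(sending)


def best_input(p, n, iters=200):
    """Return (x_p, C(p)) by ternary search over x (I is concave in x)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if mutual_information(m1, p, n) < mutual_information(m2, p, n):
            lo = m1
        else:
            hi = m2
    x_p = (lo + hi) / 2.0
    return x_p, mutual_information(x_p, p, n)
```

For example, comparing `best_input(0.3, 4)` with `best_input(0.7, 4)` should give matching capacities and maximizing inputs that sum to one (up to search tolerance), and both capacities should be at least that of `best_input(0.5, 4)`.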

Let us now use these results to bound capacity from below. 
After many calculations and simplifications [14] we obtain 



We show some numerical results for C.
C(.5) = lower capacity bounds for all p, N = 1, ..., 25

Note that in the general circumstances of Case 2.3, if p = 0 
(or similarly q = 0), we have a noiseless channel and the 
capacity is one, which is achieved when x = .5. So we see 
that 1 is a tight upper bound for the capacity. Therefore we 
have the following result: 

For Alice and N (N > 0) transmitters: C(.5) ≤ C(p) ≤ 1

and the bounds on C(p) are tight. Of course keep in mind the
result from Case 2.0:

For Alice and no additional transmitters: Capacity = 1. 

As N grows so does the noise. Therefore, we see that the
capacity is non-increasing in N. We are interested in the lower
bound C(.5). We have numerically calculated C(.5) up to N =
7750 and observed that C(.5) decreases monotonically
toward zero (for N = 7750, C(.5) = .000093). We can (but do not,
since it is many pages in length) analytically show that C(.5) is
monotonically decreasing. That is not surprising, since increasing
the number of Clueless users increases the noise, but it
is surprising that it is so difficult to show that C(.5) goes
to zero as N goes to infinity. Below we discuss that fact,
leaving the interesting and subtle details to [14].

From Eq. 1 we can express C(.5) as

C(.5) = 1 - (1/2)^N S(N),

where

Theorem 6. S(N) = 2^N \log(N+1) - \sum_{k=0}^{N} \binom{N}{k} \log(k+1)

PROOF: Not shown, basically involves combinatorial iden- 
tities. 
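The closed form lends itself to direct computation. The sketch below reads the expressions above as C(.5) = 1 - (1/2)^N S(N) with S(N) = 2^N log(N+1) - sum_{k=0}^{N} C(N,k) log(k+1), and assumes base-2 logarithms (capacity in bits per tick). The function name is ours, and evaluating the formula as an expectation over a Binomial(N, 1/2) variable is our choice for numerical convenience, not a method taken from the paper.

```python
from math import fsum, log2


def capacity_at_half(n):
    """Evaluate C(.5) = 1 - (1/2)^N * S(N) for N = n Clueless senders.

    Rewritten as 1 - log2(N+1) + E[log2(K+1)] with K ~ Binomial(N, 1/2).
    Binomial coefficients are kept as exact integers, so large N does
    not overflow; each weight C(N,k) / 2^N is converted to float last.
    """
    two_n = 1 << n
    coeff = 1  # C(n, 0)
    terms = []
    for k in range(n + 1):
        terms.append((coeff / two_n) * log2(k + 1))
        coeff = coeff * (n - k) // (k + 1)  # exact update to C(n, k+1)
    return 1.0 - log2(n + 1) + fsum(terms)
```

With this reading, `capacity_at_half(0)` gives 1 (Alice alone) and `capacity_at_half(1)` gives 0.5, and the values for larger N can be compared against the numerical results reported in the text.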

Keep in mind our goal is to study the behavior of C(.5)
as N → ∞. However, first we need a technical lemma.

Lemma 1. ^ = f or P < N, where 

Q_p(N) is a monic polynomial in N of degree p.

PROOF: Follows from [21, Formulas 1, 2, 7, 8, 9, 10, p. 608].

Theorem 7. lim_{N → ∞} C(.5) = 0.

PROOF: The proof is asymptotic in nature, but follows by 
applying Lemma 1 to Thm. 6. 

A.1 Continuity

For Scenario 2 we wished to say that capacity was a con- 
tinuous function of p. We thought that we could just use 
some standard information-theoretic result. Unfortunately, 
we could not find such a result. We do not think that it 
would be too hard to argue from the various concavity prop- 
erties of mutual information that C(p) is a continuous func- 
tion (of p). However, we decided to present a more general 
result which relies on the following theorem. 

Theorem 8. Let F(x,p) be a continuous function⁶ defined
on [0, 1] × U, U an arbitrary subset of the reals, and
assume that for each fixed p, F(x,p) achieves a maximum
denoted as Γ(p). Then Γ(p) is a continuous function of p.

PROOF: Not shown — standard analysis result using com- 
pactness arguments. 
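For readers who want the outline, one standard route is sketched below under the theorem's hypotheses. This is our own sketch, not the authors' proof, and we write \Gamma(p) for the maximum of F(x,p) over x in [0,1].

```latex
% Fix p_0 \in U and a sequence p_n \to p_0 with p_n \in U. The set
% K = \{p_0\} \cup \{p_n : n \ge 1\} is compact, so F is uniformly
% continuous on the compact set [0,1] \times K. Hence for every
% \varepsilon > 0 there is an n_0 such that, for all n \ge n_0,
\sup_{x \in [0,1]} \bigl| F(x, p_n) - F(x, p_0) \bigr| \le \varepsilon .
% Taking the maximum over x in F(x,p_n) \le F(x,p_0) + \varepsilon
% and in F(x,p_0) \le F(x,p_n) + \varepsilon gives
\bigl| \Gamma(p_n) - \Gamma(p_0) \bigr| \le \varepsilon
\quad \text{for all } n \ge n_0,
\qquad \Gamma(p) = \max_{x \in [0,1]} F(x,p),
% so \Gamma(p_n) \to \Gamma(p_0), i.e. \Gamma is continuous at p_0.
```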

We believe that continuity results such as these are im- 
portant, but they seem to be overlooked in the literature. 
Note we can replace the closed interval [0, 1] by any compact 
subset of the reals. 


6 Of course in this paper all functions are real valued. 






























